A fusion approach using GIS, green area detection, weather API and GPT for satellite image based fertile land discovery and crop suitability

Proper utilization of agricultural land is a big challenge as they often laid over as waste lands. Farming is a significant occupation in any country and improving it further by promoting more farming opportunities will take the country towards making a huge leap forward. The issue in achieving this would be the lack of knowledge of cultivable land for food crops. The objective of this work is to utilize modern computer vision technology to identify and map cultivable land for agricultural needs. With increasing population and demand for food, improving the farming sector is crucial. However, the challenge lies in the lack of suitable land for food crops cultivation. To tackle this issue, we propose to use sophisticated image processing techniques on satellite images of the land to determine the regions that are capable of growing food crops. The solution architecture includes enhancement of satellite imagery using sophisticated pan sharpening techniques, notably the Brovey transformation, aiming to transform dull satellite images into sharper versions, thereby improving the overall quality and interpretability of the visual data. Making use of the weather data on the location observed and taking into factors like the soil moisture, weather, humidity, wind, sunlight times and so on, this data is fed into a generative pre-trained transformer model which makes use of it and gives a set of crops that are suitable to be grown on this piece of land under the said conditions. The results obtained by the proposed fusion approach is compared with the dataset provided by the government for different states in India and the performance was measured. We achieved an accuracy of 80% considering the crop suggested by our model and the predominant crop of the region. Also, the classification report detailing the performance of the proposed model is presented.

• Formulate a methodological framework aimed at significantly enhancing the quality of satellite imagery from publicly available resources such as Apple Maps or Google Maps.• Design an algorithm to identify and quantify green areas within the images, calculating the acreage based on the image scale.• Extract geospatial attributes closely associated with agricultural effects, including soil characteristics, humid- ity, rainfall, and weather parameters.• Utilize the artificial generative intelligence model to provide crop suggestions based on the processed data.
• Improve image processing for satellite images through geospatial attributes and convert them into an infor- mation extraction worthy image.

Literature review
Several remarkable research studies have been carried out in the field of GIS-based agriculture, which have greatly influenced the development of this paper.Dornich 6 presents a comprehensive overview of GIS in agriculture, emphasising its role in spatial analyses and decision-making through visual data representation.Singh et al. 7 utilise the Analytic Hierarchy Process (AHP) and geospatial techniques to evaluate the suitability of cereal crops in India, with the conclusion that AHP proves effective for such analyses.CIBO Technologies elucidates the importance of remote sensing in agriculture, enabling data-driven interventions to enhance crop yield.Similarly, Moreno-Armendáriz et al. 8 and Nguyen et al. 9 employ advanced deep learning algorithms and drone imagery 10,11 for urban green space analysis, greatly assisting urban planning endeavours.Weatherbit 12 provides an Agriculture Weather Forecast API, furnishing 8-day weather forecasts specifically tailored for the agriculture industry, thereby supporting informed decision-making by farmers.Rossi et al. 13 employ a GIS-based geopedological approach to evaluate land suitability for chestnut groves, effectively demonstrating the role of GIS in promoting sustainable rural development.Siche et al. 14 explores the multifarious applications of OpenAI and ChatGPT language models in agriculture, whereas Adamchuk et al. 15 discuss the fundamental principles of agricultural remote sensing and its immense potential for agricultural producers.Mahato et al. 16 utilise machine learning algorithms to accurately detect and assess green belt areas, while GIS Stack Exchange proposes an efficient method for identifying urban green areas in Google Earth imagery using R, a the popular programming language.Stormglass 17 provides an Agritech Weather API, delivering highly detailed weather forecasts specifically designed for the agricultural sector.Lastly, Kumar 18 integrate GIS and machine learning techniques to predict cropland suitability, underscoring the significance of sustainable agriculture.Collectively, these references underscore the immense potential and practical

Proposed system
The structural architecture of the system (Fig. 1) is a unison of different modules working together, it includes from the image procurement to the processing involved in it to enhance it to further make predictions and suitability consulting easier.

Image collection
To procure the satellite imagery of the region, Apple Maps served as our primary resource.The script employed to compute the area relied extensively on a scaling factor, which was meticulously achieved by zooming into the map until it reached a scale distance of 126 yards/cm.The utilisation of Apple Maps, with its esteemed precision and reliability, proved instrumental in our pursuit of acquiring the satellite imagery necessary for our research, which is then fed into the interface we designed to make the application more user friendly with hopes of launching it as a standalone application as a future work.

Image processing
The raw image of the area acquired is then passed through an image processing layer that makes it easier to extract the features present in the image, a well known satellite imagery based technique known as the Panchromatic Sharpening 27 is used to sharpen the dull satellite images of various areas, knowing that major satellite view vendors often miss out on providing sharp image details 28 to the areas of lesser search traffics, it helps in the sub processing layer to make the satellite images more clearer.
When it comes to Panchromatic Sharpening 29 of satellite images, there are usually two methods that are prominent and viable for this application, it includes the generative adversarial network approach (GAN) as depicted in Fig. 2 and a deep network based architecture called the RES-net/PanNet (Fig. 3).Sharpening the images help greatly to make the colours stand out more and have an easier masking procedure to extract the areas out of it.
The boundaries of the buildings are more apparent, which helps to avoid any masking over the buildings and miscalculating the area that is to be found.

Conversion of the multispectral image to grayscale
To begin with, the multispectral image is converted to grayscale.This is typically done by taking a weighted average of the red, green, and blue channels, where the weights are determined based on the sensitivity of the human eye to different colours.

Resampling the panchromatic image
The panchromatic image, which has a higher spatial resolution, is resampled to match the dimensions of the grayscale multispectral image.This is done to ensure that both images have the same pixel dimensions.

High-pass filtering
This involves convolving the image with a high-pass filter kernel, such as a Laplacian or a Difference of Gaussian (DoG) filter.The high-pass filter enhances the edges and details in the image.

Fusion of images
This fusion process is often achieved through a process known as intensity modulation, where the pixel values of the panchromatic image are multiplied with the corresponding pixel values of the grayscale image.We use panchromatic sharpening as its which is quite simple but very effective in our use case as it greatly justifies the quality of the map image.Equation for panchromatic sharpening is given by: • P be the resampled panchromatic image, • F(P) be the result of applying a high-pass filter to P, and • λ be a weighting factor.
We are utilising a technique called the Brovey Transformation, which is a fusion-based method commonly employed for combining multiple data sources.It employs a ratio-based algorithm to merge images, assuming that the spectral range of the panchromatic (PAN) image matches that captured by the multispectral (MS) bands.The aim of this transformation is to generate RGB images, merging only three bands at a time.In our approach, we begin by inputting the RGB image, converting it into HSV-based images to obtain a sharper version, and then converting them back using the fusion technique.The Brovey Transformation is mathematically defined by the following equation: The symbols used in Eq. ( 2) represent the following: • X k (i,j): Refers to the data of the k th original multispectral (MS) band, with i representing the pixel number and j representing the line number of that specific band.• Y k (i,j): Represents the data of the fused multispectral (MS) band, which is the output of the fusion process.
Similar to X k (i,j), i and j denote the pixel and line numbers of the fused band.• X p (m,n): Denotes the original panchromatic (PAN) band data, where m represents the pixel number and n represents the line number of the PAN band.
(1) The observed outcomes remain visually consistent across all three resampling methods, resulting in a substantial improvement in resolution with smaller pixel values.Notably, there is a noticeable presence of colour distortion, which varies depending on the combinations of bands used as shown in Fig. 4.
From the comparison among other techniques in Table 1, we found out that the Brovey transformation method is quite efficient, although the Principal Component Analysis is almost the same, it is often compute expensive as compared to this with roughly the same results to be noticed in both of them.

Colour masking, segmentation and area calculation
To extract the green area from a map image, a thresholding technique can be applied after converting the image from the RGB colour space to the HSV colour space.This approach allows for capturing varying intensities of green more effectively compared to using a filter in an RGB setup.The HSV colour space separates the colour information into three components: Hue, Saturation, and Value.In this case, we are particularly interested in the Hue component, as it represents the dominant colour information.The steps involved in this process are as follows: 1. Convert the image from RGB to HSV colour space, this conversion is performed to represent the image in terms of Hue, Saturation, and Value.Each pixel's RGB values are transformed into corresponding HSV values using Eq.(3).  2. Thresholding based on Hue values, define a lower threshold and an upper threshold to identify the desired green spectrum.These thresholds should encompass the range of green hues you want to extract.For example, a lower threshold could be set to a lighter green, while the upper threshold could be set to a darker green.These thresholds depend on the specific shades of green you are interested in.3. Apply the thresholding operation, compare the Hue values of each pixel in the HSV image with the defined lower and upper thresholds.If the Hue value falls within this range, set the pixel value to white (indicating the green area); otherwise, set it to black (indicating non-green areas).4. Obtain the green area, the resulting thresholded image will have white regions where the green areas are detected and black regions for non-green areas.The white regions represent the extracted green area from the map image.
RGB to HSV Conversion equations are given as follows: • Color cone  The HSV image is then extracted and the area is calculated based on the number of acres per centimetre of the scaled image which is fixed to a scale of 126 yards/cm.

Getting geospatial characteristics
In order to gather geospatial attributes about a particular location, we employ the use of X and Y APIs.These APIs provide us with valuable data pertaining to botanical growth factors such as soil moisture, temperature, weather attributes, humidity, and soil characteristics, among others.By utilising this information, we establish a foundation for our GPT (Generative Pre-trained Transformer) model to acquire knowledge regarding the area's weather and soil conditions.These geospatial attributes are collected and inputted into the GPT model.The user simply uses the interface and types the name of the location, on submitting, the API calls are made and the data is procured.

Feeding the geospatial characteristics into the generative pre-trained transformer
The data collected by the API pipeline is then passed through the transformer model, specifically the davincii-002 variant, which enables us to predict potential crop outcomes feasible for cultivation in the given location under the specified conditions.To ensure accuracy, we cross-reference our findings with the Crop Weather Calendar 30 , a comprehensive dataset compiled by the Indian Council of Agricultural Research, encompassing information about various crops and their respective growth conditions.Table 2 provides information of corp wise CWC from All India Coordinated Research Project on Agrometeorology (AICRPAM) centres.
Notably, the crops identified within the research area as generated by the GPT model align with the ones documented in the Crop Weather Calendar.Moreover, the model also suggests additional crop varieties that have not been explored in the research but are deemed viable based on the provided conditions.These novel crops represent possibilities for cultivation in the area and hold potential without encountering significant obstacles.
The primary objective of this research is to identify suitable crops that can be grown on a specific piece of land, considering the prevailing conditions.Furthermore, we aim to discover new crop varieties that have not been previously experimented with.Instead of relying on traditional trial and error methods, we leverage the advancements of artificial intelligence to predict outcomes based on learned patterns and outcomes.
The transformer architecture (Fig. 5) comprises an encoder and a decoder.It excels in modelling long-range dependencies and parallel computations.With self-attention as its crown jewel, transformers have become the cornerstone of language models like GPT and BERT, enabling enhanced language understanding and generation capabilities.Residual connections and layer normalisation fortify training, propelling transformers to surpass recurrent neural networks.In essence, transformers usher in a new era of linguistic finesse and innovation.
The generative capabilities of these models can be used to reason out what crops to grow at a given place using the attributes fed into it, thanks to the countless amounts of data and hours spent training this model with tons of information.
There are various models that exist within, the one used mainly in this research is the GPT-4 model, which is the cutting edge model as of this writing.The workflow of the model consists of classification, entailments, similarity and decision of multiple choices.
Once the encoding process is over, the task-specific layers are employed based on following objectives as depicted in Fig. 6: (3) www.nature.com/scientificreports/ 1. Classification: For classification tasks, such as crop analysis or crop categorization, a classification layer is added on top of the encoder.This layer maps the contextualised representations of the input sequence to the appropriate class labels.2. Entailment: The goal here is to determine the logical relation between the two data points from the API, whether one attribute entails or contradicts the other, is neutral with respect to another etc.  www.nature.com/scientificreports/ 3. Similarity: It involves measuring semantics between the data points, how they affect each other, scoring mechanisms are used to find the similarities.4. Multiple-Choice Decision: This involves scenarios where multiple crops can be grown, each choice is processed separately through an encoder and task-specific layers, the representation is then compared or scored against each other to make the final outcome.
A baseline Large Language Model (LLM) assumes the role of the foundational iteration, devoid of alterations.It dutifully acquaints itself with language patterns derived from copious data sources.Conversely, an augmented LLM transcends the ordinary.Bolstered by grander dimensions, domain-specific training, and supplementary data, it adroitly captures the subtleties of language as depicted in Fig. 7. Purposeful modifications, tailored to specific tasks, refine its performance, thus endowing augmented models with unparalleled proficiency in comprehending and crafting linguistic expressions.

Decision making by GPT-4 model by considering the input attributes
The decision process of GPT-4 (Fig. 8) while deciding which crops to suggest is done by analysing the weather conditions, starting with the temperature.If the temperature exceeds 30 °C, the model takes into account the humidity level.When the humidity is low, it suggests crops like Groundnut, Sunflower, Millets, and Cotton.On the other hand, if the humidity is high, it recommends crops such as Rice and Sugarcane.In the case of moderate temperatures (20-30 °C), the model considers the humidity level once again.It suggests crops like Rice, Ragi (Finger Millet), Maize, Pulses such as pigeon pea (Toor Dal) and chickpea, vegetables like tomato, brinjal, and okra, and fruits like mango, banana, and guava.It also suggests considering Coffee and Tea in appropriate hilly regions.
Afterward, it takes into account the wind speed.If the wind speed is high, it proposes selecting crops with strong root systems and wind-resistant characteristics.Conversely, if the wind speed is low, a broader range of crops can be considered.It then evaluates the precipitation levels.When there is high precipitation, it recommends crops like Rice and Sugarcane that require more water.However, when precipitation is low, it suggests considering drought-resistant crops such as Millets, Groundnut, and Sunflower.
Additionally, it considers the UV index.In the event of a high UV index, it suggests opting for crops that can withstand high levels of UV radiation.On the contrary, when the UV index is low, a wider variety of crops can be considered.Once the weather conditions have been assessed, the model moves on to examine the soil conditions, specifically the soil moisture.If the soil moisture is high, it suggests crops that require more water, including Rice, Sugarcane, and fruits like mango, guava, and banana.Conversely, if the soil moisture is low, it recommends considering drought-resistant crops such as Millets, Groundnut, and Sunflower.www.nature.com/scientificreports/

Experimental results
Figure 9 illustrates the results of the image mapping process, with the primary objective of removing extraneous elements like buildings and roads from the scene, thus honing the focus exclusively on the green areas of interest.This feat is accomplished through the adept application of a negation mask, which skillfully extracts the upper and lower bounds of the green hue, subsequently permitting the subtraction of these precise regions from the original captured image.Consequently, this method deftly isolates and accentuates the green regions within the image, elegantly aligning with the intended focus of concern.
The filtered panchromatic image is then combined with the grayscale multispectral image to create a sharpened image.The resampled panchromatic image is then subjected to a high-pass filtering operation.This significantly sharpens the dull satellite images that are available to use and makes edge detection for buildings and roads more prominent, thereby, making us achieve maximum accuracy in the image processing section to calculate the acres of land available as green areas to be cultivated (Fig. 10).
Figure 11 portrays the win rate scores of 13b models given by GPT-4, correlates with the average number of unique tokens in the responses, the pearson constant, r is 0.96.
The training time of these models is a factor that affects the responses greatly, however, the perplexity, although doubled, won't really affect the models despite having half the training size (Fig. 12).
To validate the test cases, we applied our model into multiple districts across India and checked the results with the Crop Weather Calendar.In pretty much all the cases, our model successfully predicted the predominant crop for the respective districts and proved what it's capable of.Although this is ideal, sometimes it was found that the predominant crop was not ranked at the top, rather, at the second or third position; despite this being a minor variation to look for as ultimately the crops shown are the ones that can be grown, the reason for this is attributed to various factors such as local variations in the state region as the dataset is focused entirely on a certain part of the state; another factor could be the limitation of the model's training but it is a very slim margin on that.Regardless of what it may be, the model consistently provided accurate predictions for the predominant crop, aligning with the guidelines set by the CWC.www.nature.com/scientificreports/ In the confusion matrix (Fig. 13), each cell represents the count of predictions for a specific combination of true label and predicted label.For example, the value 7 in the cell corresponding to "Rice" as the true label and "Rice" as the predicted label indicates that there were 7 instances where the model correctly predicted rice as the predominant crop.Results of the evaluation metrics are illustrated in Table 3. Equations for the Evaluation Metrics used are follows: 4 represents the comparison of the results predicted by proposed system with the dominant crop provided in the dataset.The 0's represent the instances where the dominant crop isn't in the first position among the predicted crops; and the 1's represent the converse.
For example, in District A, rice was identified as the predominant crop by our model, which coincided with the recommendations provided by the CWC.The employed satellite image processing techniques involved extracting vegetation indices, analysing soil moisture levels, temperature patterns, wind speed, and humidity.By combining these attributes, our model effectively assessed the suitability of rice cultivation in this district.www.nature.com/scientificreports/information empowers farmers to exercise greater control over their crop selection, facilitating the diversification of their crop portfolio and enabling them to make well-informed decisions regarding suitable crops specific to their region.This aspect grants farmers the opportunity to explore novel crop alternatives, adapt to changing market demands, and potentially enhance their profitability.It can be observed from Table 3 that the classification report gave good precision, recall and F1-score for most of the crops.Also, Table 4 asserts that out of 20 crops considered under study, the proposed model prediction matched the dominant crop as provided in the dataset 15 times underlining the consistency is predicting the right crop for the region.
Through the integration of sophisticated image processing techniques and leveraging the capabilities of our GPT-based model, we provide farmers with a comprehensive analysis of their agricultural lands.The supplementary crop suggestions furnished by our model present farmers with a wider array of choices, augmenting their decision-making process.This not only fosters a greater sense of autonomy for farmers in selecting the crops they wish to cultivate but also assists them in making informed choices regarding crops that are best suited for their particular region.Consequently, this approach promotes sustainable farming practices and cultivates a culture of agricultural innovation.

Conclusion and future work
In this research, we have proposed a GIS-based system that leverages satellite images, green area detection, weather APIs, and GPT crop suitability analysis to facilitate the discovery of fertile land and provide accurate crop recommendations.Our unique approach combines location-based analysis with the integration of soil and climate attributes, enabling us to offer tailored suggestions for crop cultivation.Additionally, our system possesses the capability to utilise predicted climate data, allowing us to provide recommendations for future dates.One of the key differentiating factors of our system is the utilisation of location information.By incorporating location data, we can pinpoint specific regions and analyse the soil and climate attributes of those areas.Satellite imagery, a cornerstone of this research, provides a panoramic view of extensive areas, enabling comprehensive analysis of different regions.Leveraging pre-processing techniques, such as image enhancement and feature extraction, allows for the extraction of valuable attributes from satellite images.The proposed model reported a good precision, recall and F1-score for most of the crops.Also, the prediction of dominant crop by the model matched with the dominant crop provided in the dataset in 15 out of 20 instances.By empowering farmers with data-driven insights, the proposed system aims to reduce reliance on intuition and generational knowledge, paving the way for informed decisions, improved crop yields, resource efficiency, and overall sustainability in agriculture.The research signifies a bold step towards the future of agriculture, where cutting-edge technology plays a pivotal role in shaping efficient and sustainable farming practices.Future work will be towards enhancing the accuracy of identifying cultivable land.Currently, we utilise a basic green area detection mechanism, which may include non-agricultural spaces such as buildings, small parks, or backyards.To address this, we can explore machine learning algorithms to differentiate between suitable agricultural land and non-cultivable areas.By training the model on a diverse range of agricultural data and incorporating domain-specific knowledge, we can enhance its ability to provide more accurate and contextually relevant recommendations.By expanding the system to include additional factors such as market demand, economic viability, and crop rotation can provide farmers with a more comprehensive and holistic decision-making tool.

Figure 1 .
Figure 1.Architecture diagram for proposed satellite image processing.

Figure 5 .
Figure 5.A basic transformer model architecture.

Figure 6 .
Figure 6.Decision process of the transformer model.

Figure 7 .
Figure 7. Workflow of the transformer model.

Figure 13 .
Figure 13.Confusion matrix of crop suitability analysis.

•
Let M be the grayscale multispectral image,

Table 1 .
Comparison of other fusion techniques in comparison with the Brovey transformation technique.